Handbook of Big Data Analytics and Forensics by Unknown

Author: Unknown
Language: eng
Format: epub
ISBN: 9783030747534
Publisher: Springer International Publishing


Aaruni Upadhyay (Corresponding author)

Email: [email protected]

Keywords

Fair clustering, K-median, Fairness, Machine learning, Cyber-physical systems, Internet of things

1 Introduction

Recent years have witnessed a proliferation of computerized systems in most aspects of daily life [1–5], which has encouraged cybercriminals to attack these systems by designing sophisticated attack patterns. The safety of our society and infrastructure depends on keeping mission-critical systems, such as water distribution, safe from cyber-attacks [6–10]. Many such systems work in tandem with Internet of Things (IoT) systems and other cyber-physical systems that are susceptible to attacks by hostile nations and non-state actors [11–15]. Machine learning is increasingly used to design systems that can detect such attacks through clustering, an unsupervised machine learning technique [16–19].

The behavior of a machine learning system depends on its training data, which may contain biases that are then reflected in the outcome [20]. This problem was highlighted by Chierichetti et al. [21], who argue that biases may still appear indirectly in the results even if only unprotected attributes (such as a person’s height) are used for making decisions instead of protected ones such as race and gender. This can happen because of hidden correlations between protected and unprotected attributes. For example, average height (unprotected) is correlated with gender (protected) and can be exploited as a proxy for discrimination.

The established approach that machine learning researchers follow to address this problem can be traced back to the US Supreme Court case Griggs v. Duke Power Co. [22], which gave rise to the concept of adverse impact. Adverse impact occurs when a practice negatively and disproportionately affects a protected group, regardless of whether the effect is indirect or unintentional. Researchers adopted the “80% rule” as a generally accepted way to measure adverse impact. It states that an adverse impact has occurred if “the selection rate for a certain group is less than 80 percent of that of the group with the highest selection rate” [23].
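As an illustration of the 80% rule quoted above, the following minimal sketch computes each group's selection rate and flags any group whose rate falls below 80 percent of the highest rate. The function name, group names, and counts are hypothetical and are not taken from the chapter.

```python
def adverse_impact(selected, total, threshold=0.8):
    """Apply the 80% rule to per-group selection counts.

    selected and total map a group name to a count; both dictionaries and
    the group names used below are illustrative assumptions.
    """
    # Selection rate of each group: selected members / all members.
    rates = {group: selected[group] / total[group] for group in total}
    highest = max(rates.values())
    if highest == 0:
        raise ValueError("no group has a positive selection rate")
    # Ratio of each group's rate to the highest selection rate.
    ratios = {group: rate / highest for group, rate in rates.items()}
    # A group is flagged when its ratio is below the threshold (80%).
    flagged = sorted(g for g, ratio in ratios.items() if ratio < threshold)
    return ratios, flagged


# Hypothetical example: 50 of 100 selected in group_a, 28 of 80 in group_b.
total = {"group_a": 100, "group_b": 80}
selected = {"group_a": 50, "group_b": 28}
ratios, flagged = adverse_impact(selected, total)
print(ratios, flagged)
```

Here group_b has a selection rate of 0.35 against 0.50 for group_a, a ratio of 0.7, so it would be flagged under the rule.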

Chierichetti et al. applied this notion of fairness to clustering by introducing fairlets, which group data points together while preserving the fairness objective. These fairlets are then combined to form clusters using existing k-median algorithms. In this way, fair clustering reduces bias by constraining the clusters so that the probability of each class of input data points being present in a cluster is strictly greater than zero. However, fair clustering achieved with this method has a super-quadratic runtime. The paper on which we base our research [24] presents a new implementation of this fair clustering method that runs in near-linear time and therefore scales to large inputs.
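The idea of building fairlets before clustering can be sketched as follows. This is only a simplified illustration, not the method of [21] or [24]: it assumes two classes of points with equal counts and pairs them with a greedy nearest-neighbour heuristic, whereas the published approaches use more careful decompositions. The function name and the sample coordinates are hypothetical.

```python
import math


def greedy_fairlets(class_a_points, class_b_points):
    """Pair each class-A point with its nearest unused class-B point.

    Greedy heuristic for illustration only; assumes both classes have the
    same number of points, so every fairlet holds one point of each class.
    """
    if len(class_a_points) != len(class_b_points):
        raise ValueError("this sketch assumes equal numbers of both classes")
    unused = list(class_b_points)
    fairlets = []
    for p in class_a_points:
        # Closest class-B point that has not been paired yet.
        q = min(unused, key=lambda candidate: math.dist(p, candidate))
        unused.remove(q)
        fairlets.append((p, q))
    return fairlets


# Hypothetical 2-D points for the two classes.
class_a = [(0.0, 0.0), (10.0, 10.0)]
class_b = [(9.0, 10.0), (1.0, 0.0)]
fairlets = greedy_fairlets(class_a, class_b)

# One representative per fairlet; these would be passed to a k-median solver,
# and every point would then follow its fairlet's representative.
representatives = [p for p, _ in fairlets]
print(fairlets, representatives)
```

Because each fairlet contains both classes and fairlets are never split, any cluster assembled from whole fairlets also contains both classes.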

To formally outline the problem, we must first define fair clustering, and we use the same definition as our base paper. Consider a set P of n points from the training dataset such that each point belongs to one of two types, T1 or T2. In a practical application, these types can correspond to the values of any legally protected attribute, such as gender, where T1 is male and T2 is female.
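The excerpt ends before the formal definition is given. As a hedged illustration of how the two types can be used, the sketch below computes the balance measure from the fairlet literature of Chierichetti et al., namely the smaller of the two ratios #T1/#T2 and #T2/#T1 within a cluster. Whether the base paper uses exactly this measure is an assumption here, and the function names and sample clusters are hypothetical.

```python
from collections import Counter


def balance(point_types):
    """Balance of one cluster: min(#T1/#T2, #T2/#T1), or 0.0 if a type is absent."""
    counts = Counter(point_types)
    n1, n2 = counts["T1"], counts["T2"]
    if n1 == 0 or n2 == 0:
        return 0.0
    return min(n1 / n2, n2 / n1)


def clustering_balance(clusters):
    """Balance of a clustering: the balance of its least balanced cluster."""
    return min(balance(cluster) for cluster in clusters)


# Hypothetical clusters, each given as the list of types of its points.
clusters = [
    ["T1", "T2", "T1", "T2"],  # perfectly balanced
    ["T1", "T1", "T2"],        # two T1 points for every T2 point
    ["T1", "T1"],              # T2 is absent
]
per_cluster = [balance(cluster) for cluster in clusters]
overall = clustering_balance(clusters)
print(per_cluster, overall)
```

A balance of 1.0 means both types are equally represented in the cluster, while 0.0 means one type is missing entirely.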


